Code Review
This pull request introduces content-defined chunking (CDC) for remote caching, which is a significant feature for improving performance with large artifacts. The implementation across the client, server, and protocol seems well-structured. I've identified a critical memory issue in the remote worker's spliceBlob implementation that could lead to out-of-memory errors when handling large files. Additionally, there's a performance issue on the client-side uploader where files are read twice. Addressing these points will improve the robustness and efficiency of this new feature.
Pull request overview
This PR adds experimental remote cache content-defined chunking (CDC) support using FastCDC 2020, enabling large blobs to be uploaded/downloaded as reusable chunks via new SplitBlob / SpliceBlob RPCs (requires server capability support).
Changes:
- Updates REAPI protos to define `SplitBlob`/`SpliceBlob`, chunking capabilities/params, and related logging protos.
- Implements client-side chunking flow (chunking config discovery, chunk upload + splice registration, chunked download + reassembly) gated behind `--experimental_remote_cache_chunking`.
- Adds worker-side CAS support for the new RPCs plus unit/integration tests and a JMH benchmark.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| third_party/remoteapis/build/bazel/remote/execution/v2/remote_execution.proto | Adds Split/Splice RPCs, chunking capabilities/params, and various REAPI comment fixes. |
| src/tools/remote/src/main/java/com/google/devtools/build/remote/worker/CasServer.java | Implements worker-side SplitBlob / SpliceBlob handling for integration testing. |
| src/tools/remote/src/main/java/com/google/devtools/build/remote/worker/CapabilitiesServer.java | Advertises Split/Splice + FastCDC params in cache capabilities. |
| src/main/protobuf/remote_execution_log.proto | Adds log detail messages for SplitBlob/SpliceBlob and wires them into the RPC details oneof. |
| src/main/java/com/google/devtools/build/lib/remote/util/DigestUtil.java | Adds partial-array digest computation for chunk hashing. |
| src/main/java/com/google/devtools/build/lib/remote/options/RemoteOptions.java | Adds --experimental_remote_cache_chunking flag. |
| src/main/java/com/google/devtools/build/lib/remote/logging/SplitBlobHandler.java | Adds logging handler for SplitBlob calls. |
| src/main/java/com/google/devtools/build/lib/remote/logging/SpliceBlobHandler.java | Adds logging handler for SpliceBlob calls. |
| src/main/java/com/google/devtools/build/lib/remote/logging/LoggingInterceptor.java | Routes SplitBlob/SpliceBlob RPCs to the new logging handlers. |
| src/main/java/com/google/devtools/build/lib/remote/common/RemoteCacheClient.java | Extends cache client interface with optional spliceBlob hook. |
| src/main/java/com/google/devtools/build/lib/remote/chunking/FastCDCChunker.java | Introduces FastCDC-based chunk boundary selection and chunk digesting. |
| src/main/java/com/google/devtools/build/lib/remote/chunking/ChunkingConfig.java | Introduces chunking configuration and server-capability-derived defaults. |
| src/main/java/com/google/devtools/build/lib/remote/chunking/BUILD | Adds BUILD target for the new chunking library. |
| src/main/java/com/google/devtools/build/lib/remote/RemoteServerCapabilities.java | Adds compatibility checks for chunking-related server capabilities. |
| src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java | Discovers chunking config from server capabilities and passes it to the cache client. |
| src/main/java/com/google/devtools/build/lib/remote/GrpcCacheClient.java | Adds client implementations of SplitBlob / SpliceBlob RPC calls. |
| src/main/java/com/google/devtools/build/lib/remote/CombinedCache.java | Integrates chunked upload/download paths into CAS file/blob operations. |
| src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobUploader.java | Implements chunked upload: chunk -> find-missing -> upload missing -> splice. |
| src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobDownloader.java | Implements chunked download: split -> download chunks -> reassemble. |
| src/main/java/com/google/devtools/build/lib/remote/BUILD | Wires the new chunking sources into the main remote library build. |
| src/test/java/com/google/devtools/build/lib/remote/chunking/FastCDCChunkerTest.java | Unit tests for chunk boundary behavior and digest correctness. |
| src/test/java/com/google/devtools/build/lib/remote/chunking/ChunkingConfigTest.java | Unit tests for defaults and server capability parsing. |
| src/test/java/com/google/devtools/build/lib/remote/chunking/FastCDCBenchmark.java | Adds a JMH benchmark for chunking throughput. |
| src/test/java/com/google/devtools/build/lib/remote/chunking/BUILD | Adds test and benchmark targets for chunking tests. |
| src/test/java/com/google/devtools/build/lib/remote/ChunkedCacheIntegrationTest.java | Integration tests for remote-only chunking behavior via SplitBlob. |
| src/test/java/com/google/devtools/build/lib/remote/ChunkedDiskCacheIntegrationTest.java | Integration tests for chunking with disk cache capturing chunk blobs. |
| src/test/java/com/google/devtools/build/lib/remote/ChunkedBlobUploaderTest.java | Unit tests for chunked uploader behavior (missing-chunk selection and data correctness). |
| src/test/java/com/google/devtools/build/lib/remote/ChunkedBlobDownloaderTest.java | Unit tests for chunked downloader behavior (reassembly ordering and edge cases). |
| src/test/java/com/google/devtools/build/lib/remote/RemoteSpawnRunnerWithGrpcRemoteExecutorTest.java | Updates GrpcCacheClient construction to include the new chunkingConfig parameter. |
| src/test/java/com/google/devtools/build/lib/remote/GrpcCacheClientTest.java | Updates GrpcCacheClient construction to include the new chunkingConfig parameter. |
| src/test/java/com/google/devtools/build/lib/remote/ByteStreamBuildEventArtifactUploaderTest.java | Updates GrpcCacheClient construction to include the new chunkingConfig parameter. |
| src/test/java/com/google/devtools/build/lib/remote/BUILD | Registers new chunking-related integration tests and adds chunking sources/deps. |
This is needed to run `bb print` to see Split/Splice calls, to test bazelbuild/bazel#28437. `bb print` reads the gRPC log directly from what's in this file, so updating this file is sufficient.
TLDR: This PR enables content-defined chunking (FastCDC) for large uploads/downloads to remote cache, saving ~40% storage, ~50% upload bandwidth, and making builds faster by deduplicating similar artifacts across builds.
RELNOTES[NEW]: Added `--experimental_remote_cache_chunking` flag to read and write large blobs to/from the remote cache in chunks. Requires server support.

Motivation
Actions like `GoLink` and `CppLink` produce very large output files that are often similar between builds. A small source change can cause a cache miss, wasting storage, bandwidth, and time on nearly-identical artifacts.

Content-Defined Chunking (CDC) addresses this by splitting files at content-determined cut points. Because cut points are derived from the file content itself, small changes, even ones that shift bytes around, tend to affect only a few chunks. This makes action outputs effectively incremental: even though the action must re-run, the upload, download, and storage costs shrink dramatically.
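
To illustrate how cut points can be derived from content, here is a minimal Java sketch of a gear-hash chunker. It is a simplified stand-in, not the PR's `FastCDCChunker` and not the full FastCDC 2020 algorithm (which also uses normalized chunking). The class name, size limits, mask, and table seed are all assumptions chosen for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Simplified gear-hash content-defined chunker (illustrative sketch only). */
public final class GearChunkerSketch {
  // Hypothetical parameters for the demo; not the PR's or any server's defaults.
  private static final int MIN_SIZE = 256;
  private static final int MAX_SIZE = 4096;
  // Test the top 10 bits of the hash: a cut fires with probability ~1/1024 per byte.
  private static final long MASK = -1L << 54;

  // One fixed pseudo-random 64-bit value per possible byte value.
  private static final long[] GEAR = new long[256];

  static {
    Random random = new Random(42); // fixed seed so cut points are reproducible
    for (int i = 0; i < GEAR.length; i++) {
      GEAR[i] = random.nextLong();
    }
  }

  private GearChunkerSketch() {}

  /** Returns the exclusive end offset of every chunk in {@code data}. */
  public static List<Integer> cutPoints(byte[] data) {
    List<Integer> cuts = new ArrayList<>();
    int start = 0;
    while (start < data.length) {
      int end = Math.min(start + MAX_SIZE, data.length);
      int cut = end; // fall back to the size limit (or end of input)
      long hash = 0; // the hash restarts at every chunk boundary
      for (int i = start; i < end; i++) {
        // Shifting left ages out old bytes: only the last 64 bytes can
        // influence the hash, so the decision depends on local content.
        hash = (hash << 1) + GEAR[data[i] & 0xff];
        if (i - start + 1 >= MIN_SIZE && (hash & MASK) == 0) {
          cut = i + 1;
          break;
        }
      }
      cuts.add(cut);
      start = cut;
    }
    return cuts;
  }
}
```

Because each boundary decision depends only on a short window of preceding bytes, an insertion or deletion typically disturbs the chunks around the edit, after which the cut points tend to realign with those of the original file.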
Results
Benchmarked across the last 50 commits of the BuildBuddy repo (server and client on the same host):
Key takeaways:
Additional benefits: better load balancing across distributed clusters (fewer long-running RPCs) and more granular retries on unstable networks.
How It Works
Write path:
- Call `FindMissingBlobs` to identify which chunks the server already has.
- Call `SpliceBlob` to register the blob-to-chunks mapping on the server.

Read path:
- Call `SplitBlob` to get the chunk list for this blob.

If `--disk_cache` is enabled, previously downloaded chunks are served locally.

Dependencies
Depends on the APIs added in #28614
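
The write path described under How It Works can be sketched against an in-memory stand-in for the server. This is a hedged illustration only: `FakeCas`, `uploadChunked`, and the string digests are hypothetical names invented here, and the real client uses the REAPI gRPC stubs and `Digest` protos rather than these types.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of chunk -> find-missing -> upload missing -> splice (hypothetical types). */
public final class ChunkedUploadSketch {
  /** Hypothetical in-memory stand-in for a CAS server; not a real API. */
  public static final class FakeCas {
    public final Map<String, byte[]> blobs = new HashMap<>();
    public final Map<String, List<String>> splices = new HashMap<>();

    /** Returns the subset of digests the "server" does not have yet. */
    public Set<String> findMissingBlobs(Collection<String> digests) {
      Set<String> missing = new LinkedHashSet<>();
      for (String digest : digests) {
        if (!blobs.containsKey(digest)) {
          missing.add(digest);
        }
      }
      return missing;
    }

    public void upload(String digest, byte[] data) {
      blobs.put(digest, data);
    }

    /** Records which chunks, in order, make up the blob. */
    public void spliceBlob(String blobDigest, List<String> chunkDigests) {
      splices.put(blobDigest, new ArrayList<>(chunkDigests));
    }
  }

  private ChunkedUploadSketch() {}

  /** Hex-encoded SHA-256, standing in for a REAPI digest. */
  public static String sha256(byte[] data) {
    try {
      StringBuilder hex = new StringBuilder();
      for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Uploads only the chunks the server lacks; returns how many were sent. */
  public static int uploadChunked(FakeCas cas, List<byte[]> chunks) {
    List<String> chunkDigests = new ArrayList<>();
    Map<String, byte[]> byDigest = new LinkedHashMap<>(); // dedupes repeated chunks
    ByteArrayOutputStream whole = new ByteArrayOutputStream();
    for (byte[] chunk : chunks) {
      String digest = sha256(chunk);
      chunkDigests.add(digest);
      byDigest.put(digest, chunk);
      whole.write(chunk, 0, chunk.length);
    }
    Set<String> missing = cas.findMissingBlobs(byDigest.keySet());
    for (String digest : missing) {
      cas.upload(digest, byDigest.get(digest));
    }
    cas.spliceBlob(sha256(whole.toByteArray()), chunkDigests);
    return missing.size();
  }
}
```

Uploading a second blob that shares most of its chunks with an earlier one transfers only the chunks that differ, which is where the bandwidth and storage savings come from.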